The statistic tf-idf (term frequency - inverse document frequency) is intended to measure how important a word is to a document in a collection (or corpus) of documents.
The inverse document frequency for any given term is defined as
\[ idf\text{(term)} = \ln{\left(\frac{n_{\text{documents}}}{n_{\text{documents containing term}}}\right)} \]
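As a minimal illustration of this definition, the document frequencies and the resulting idf values can be computed on a toy corpus in base R (the three example documents are made up for illustration):

```r
# Toy corpus: three documents as term vectors
docs <- list(
  c("media", "agenda", "party"),
  c("media", "news", "topic"),
  c("party", "press", "release")
)

# Number of documents containing each term
vocab <- unique(unlist(docs))
df_count <- sapply(vocab, function(term) {
  sum(sapply(docs, function(d) term %in% d))
})

# Inverse document frequency as defined above
idf <- log(length(docs) / df_count)
round(idf, 3)
# "media" appears in 2 of 3 documents -> log(3/2) ~ 0.405
```

Terms that occur in every document get an idf of zero; rare terms are weighted up.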
Parties want the media agenda to be congruent with their own agenda, since it defines the issue-based criteria on which voters will evaluate them (Eberl, 2017). Thus, parties instrumentalize their press releases to highlight issues that they are perceived to be competent on, that they “own”, and that are important to their voters (Kepplinger & Maurer, 2004). Editors select from this universe and decide which of these topics will be discussed in the news. In that sense, the ideological content of a newspaper refers to the extent to which the topics promoted by the parties correlate with the topics discussed in the news articles.
To discover the latent topics in the corpus of press releases (n = 1,942) and news articles (n = 11,880), a structural topic model (STM), developed by Roberts et al. (2016), is applied. STM is an unsupervised machine learning approach that models topics as multinomial distributions over words and documents as multinomial distributions over topics, and it allows external covariates to be incorporated that affect both topical content and topical prevalence.
STM assumes a fixed, user-specified number of topics. There is no single “right” number of topics for a given corpus (Grimmer and Stewart 2013). Roberts et al. (2016) propose to measure topic quality through a combination of semantic coherence and exclusivity of words to topics. Semantic coherence is a criterion developed by Mimno et al. (2011) and is closely related to pointwise mutual information (Newman et al. 2010): it is maximized when the most probable words in a given topic frequently co-occur.
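To make the coherence criterion concrete, here is a small base-R sketch of the Mimno et al. (2011) measure; the document-term matrix and the chosen top words are toy inputs, not model output. For a topic's top words \(v_1, \dots, v_M\) it sums \(\log\frac{D(v_m, v_l) + 1}{D(v_l)}\) over word pairs, where \(D(\cdot)\) counts the documents containing the word(s):

```r
# Toy binary document-term matrix: rows = documents, cols = terms
dtm <- rbind(
  c(steuer = 1, reform = 1, partei = 0, wahl = 0),
  c(steuer = 1, reform = 1, partei = 1, wahl = 0),
  c(steuer = 0, reform = 0, partei = 1, wahl = 1)
)

# Semantic coherence (Mimno et al. 2011) for one topic
semantic_coherence <- function(dtm, top_words) {
  score <- 0
  for (m in 2:length(top_words)) {
    for (l in 1:(m - 1)) {
      # D(v_m, v_l): documents in which both words occur
      co  <- sum(dtm[, top_words[m]] & dtm[, top_words[l]])
      # D(v_l): documents in which the conditioning word occurs
      doc <- sum(dtm[, top_words[l]] > 0)
      score <- score + log((co + 1) / doc)
    }
  }
  score
}

semantic_coherence(dtm, c("steuer", "reform", "partei"))
# -> log(3/2) ~ 0.405; larger (less negative) values mean the
#    top words co-occur more often
```

Frequently co-occurring top words push the score towards zero, while pairs that never co-occur contribute large negative terms.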
Using the function searchK, several automated tests are performed to help choose the number of topics, including average exclusivity and semantic coherence as well as the held-out likelihood (Wallach et al. 2009) and the residuals (Taddy 2012).
I included the document source as a covariate for topical prevalence, assuming that the distribution of topics depends on the source. The number of topics is set to 50.
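The diagnostic step can be sketched as follows; this mirrors stm's searchK() interface but runs on the small gadarian example corpus shipped with the package rather than the actual press-release/news corpus (for which prevalence = ~source would be used), since searchK is computationally expensive:

```r
library(stm)

# Prepare the built-in example corpus (open-ended survey responses)
processed <- textProcessor(gadarian$open.ended.response,
                           metadata = gadarian)
out <- prepDocuments(processed$documents, processed$vocab,
                     processed$meta)

# Compare candidate numbers of topics
kresult <- searchK(out$documents, out$vocab, K = c(5, 10),
                   data = out$meta)

# kresult$results holds held-out likelihood, residuals,
# semantic coherence and exclusivity for each candidate K
plot(kresult)
```

The plot method draws the four diagnostics against K; the chosen K should balance semantic coherence against exclusivity rather than optimize any single criterion.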
library(stm)
library(tidyverse)
library(ggthemes)
library(xtable)
library(viridis)
rm(list = ls())
color <- "#b7b7b7"
color1 <- "#778899"
color2 <- "#808080"
color3 <- "#000000"
source("func/functions.R")
load("../output/models/finalmodel_50.RDa")
k <- stmOut$settings$dim$K
model_df <- model_df %>%
  dplyr::mutate(
    doc_index = as.numeric(rownames(.)),
    # harmonize source labels; unmatched values are kept as-is
    source = dplyr::recode(source,
      "welt.de"    = "DIE WELT",
      "zeit.de"    = "ZEIT ONLINE",
      "focus.de"   = "FOCUS Online",
      "bild.de"    = "Bild.de",
      "spiegel.de" = "SPIEGEL ONLINE",
      "union"      = "Union",
      "spd"        = "SPD",
      "afd"        = "AfD",
      "gruene"     = "Grüne",
      "linke"      = "Linke",
      "fdp"        = "FDP"
    )
  )
stmOut$settings$call
## stm(documents = out$documents, vocab = out$vocab, K = k, prevalence = ~source,
## content = ~type, data = out$meta, init.type = "Spectral",
## max.em.its = 75)
model_df %>%
group_by(type) %>%
tally()
## # A tibble: 2 x 2
## type n
## <chr> <int>
## 1 news 15135
## 2 press 2666
To explore the words associated with each topic, we use the words with the highest probability in each topic. As we included the source type (press release or newspaper) as a covariate for topical content (the word distribution of each topic), we have two different labels for each topic.
sagelabs <- sageLabels(stmOut, 20)
for (i in seq(k)) {
name <- paste0("topic_label",i)
png(paste0('../figs/topiclabel/',name,'.png'), width = 400, height = 450)
plotQuote(
c(paste(sagelabs$covnames[1],":",
paste(sagelabs$cov.betas[[1]]$problabels[i,], collapse="\n")),
paste(sagelabs$covnames[2],":",
paste(sagelabs$cov.betas[[2]]$problabels[i,], collapse="\n"))
)
)
dev.off()
}
theta <- as.data.frame(stmOut$theta) %>% # get all theta values for each document
mutate(doc_index = as.numeric(rownames(.))) %>%
# convert to long format
gather(topic, theta, -doc_index) %>%
mutate(topic = as.numeric(gsub("V","",topic))) %>%
# join with topic df
left_join(., topics.df, by="topic") %>%
# join with model_df
left_join(., model_df %>%
select(date,type,source,doc_index,title_text), by="doc_index")
# select a random document
doc <- sample(unique(theta$doc_index),1)
sample <- theta %>% filter(doc_index == doc)
caption <- model_df %>% filter(doc_index == doc) %>% select(title, source)
sample %>%
ggplot(aes(joint_label, theta)) +
geom_col(fill = color1) +
coord_flip() +
ylim(c(0,1)) +
theme_hc() +
labs(x = NULL, y = NULL, caption = paste("title:",caption$title,"(",caption$source,")"))
ggsave("../figs/doc_topic_distr.png", height = 8, width = 8)
The expected proportion of the corpus that belongs to each topic is used to get an initial overview of the results. The figure below displays the topics ordered by their expected frequency across the corpus. The four most frequent words in each topic are used as a label for that topic.
overall_freq <- as.data.frame(colMeans(stmOut$theta)) %>%
transmute(
topic = as.numeric(rownames(.)),
frequency = colMeans(stmOut$theta)
) %>%
left_join(., topics.df, by = "topic") %>%
arrange(desc(frequency))%>%
mutate(order = row_number())
overall_freq %>%
ggplot(aes(reorder(joint_label, -order),frequency)) +
geom_col(fill = color1) +
coord_flip() +
scale_fill_gradient(low = color2, high = color1) +
theme_hc() +
labs(x=NULL, y=NULL)
ggsave("../figs/topic_proportion.png", height = 10, width = 10)
Agendas are measured as percentage distributions across the 50 topics. For each source, the average proportion of each topic is calculated per month.
topicmean_monthly <- theta %>%
mutate(
year = lubridate::year(date),
month = lubridate::month(date)
) %>%
filter(month != 5) %>%
group_by(topic, joint_label, source,type, month, year) %>%
summarise(topicmean = mean(theta, na.rm = T)) %>%
ungroup() %>%
mutate(date = as.Date(paste0(year,"/",month,"/1")))
topicmean_news <- theta %>%
filter(type == "news") %>%
group_by(topic,joint_label,source) %>%
summarise(topicmean = mean(theta, na.rm = T)) %>%
ungroup()
topicmean_press <- theta %>%
filter(type == "press") %>%
group_by(topic,joint_label, source) %>%
summarise(topicmean = mean(theta)) %>%
ungroup()
topicmean_news %>%
group_by(source) %>%
arrange(desc(topicmean), .by_group = TRUE) %>%
mutate(topic_order = row_number()) %>%
ungroup() %>%
group_by(joint_label) %>%
mutate(topicmean_mean = mean(topicmean)) %>%
ungroup() %>%
top_n(70, topicmean_mean) %>%
ggplot(aes(reorder(joint_label, topicmean_mean),
topicmean, label = topic_order,
fill = topic_order)) +
geom_col(show.legend = F) +
geom_text(hjust=-0.1, size=5) +
coord_flip() +
scale_fill_gradient(low = color1, high = color3) +
facet_wrap(~source, nrow = 1) +
labs(x=NULL, y=NULL) +
theme(axis.text.y = element_text(size=12))
topicmean_press %>%
group_by(source) %>%
arrange(desc(topicmean), .by_group = TRUE) %>%
mutate(topic_order = row_number()) %>%
ungroup() %>%
group_by(joint_label) %>%
mutate(topicmean_mean = mean(topicmean)) %>%
ungroup() %>%
top_n(50, topicmean_mean) %>%
ggplot(aes(reorder(joint_label, topicmean_mean),
topicmean, label = topic_order,
fill=topic_order)) +
geom_col(show.legend = F) +
geom_text(hjust=-0.1, size=5) +
coord_flip() +
scale_fill_gradient(low = color1, high = color3) +
facet_wrap(~source, nrow = 1) +
labs(x=NULL, y=NULL) +
theme(axis.text.y = element_text(size=12))
For each source \(s\), the document-topic distributions of all its documents are collected in a \(k \times d\) matrix \(\Theta_s\), whose columns are the document vectors:
\[ \Theta_s = \begin{bmatrix} \theta_{11} & \cdots & \theta_{1d} \\ \vdots & \ddots & \vdots \\ \theta_{k1} & \cdots & \theta_{kd} \end{bmatrix} \]
where \(\theta_{j}\) denotes the \(j\)-th column of \(\Theta_s\) for \(j \in \{1, \dots, d\}\). The \(k \times 1\) vector \(\theta_{j}\) gives the \(j\)-th document's probabilities over the \(k\) topics.
E.g. for “DIE WELT” the following matrix of document-topic distributions is given:
theta %>%
filter(source == "DIE WELT") %>%
filter(doc_index %in% seq(1,50)) %>%
select(doc_index, topic, theta) %>%
mutate(theta = round(theta, 3)) %>%
spread(doc_index,theta) %>% select(-topic) %>%
htmlTable::htmlTable()
| Topic | doc 1 | doc 2 | doc 3 | doc 4 | doc 5 | doc 12 | doc 14 | doc 15 | doc 38 | doc 40 | doc 42 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.031 | 0.007 | 0.002 | 0.028 | 0.035 | 0.074 | 0.001 | 0.036 | 0 | 0 | 0.001 |
| 2 | 0.013 | 0.001 | 0.046 | 0.015 | 0.02 | 0.156 | 0 | 0.035 | 0.001 | 0.001 | 0.001 |
| 3 | 0.019 | 0 | 0.001 | 0.022 | 0.014 | 0.008 | 0 | 0.008 | 0 | 0 | 0.001 |
| 4 | 0.001 | 0 | 0.001 | 0.001 | 0.001 | 0.002 | 0 | 0.001 | 0 | 0 | 0.008 |
| 5 | 0.001 | 0.001 | 0 | 0.001 | 0.001 | 0.001 | 0 | 0.002 | 0.001 | 0.002 | 0 |
| 6 | 0.006 | 0 | 0 | 0.007 | 0 | 0.04 | 0 | 0.001 | 0 | 0 | 0 |
| 7 | 0.008 | 0.002 | 0.001 | 0.009 | 0.009 | 0.003 | 0 | 0.013 | 0 | 0 | 0.001 |
| 8 | 0.029 | 0 | 0.005 | 0.022 | 0.022 | 0.006 | 0 | 0.013 | 0 | 0 | 0.001 |
| 9 | 0.049 | 0.025 | 0.002 | 0.049 | 0.06 | 0.138 | 0.024 | 0.073 | 0 | 0 | 0.001 |
| 10 | 0.011 | 0.109 | 0.014 | 0.008 | 0.009 | 0.034 | 0.124 | 0.014 | 0.001 | 0.001 | 0 |
| 11 | 0.021 | 0 | 0.002 | 0.011 | 0.009 | 0.004 | 0 | 0.009 | 0 | 0 | 0.004 |
| 12 | 0 | 0 | 0 | 0 | 0 | 0.013 | 0 | 0 | 0 | 0 | 0 |
| 13 | 0.006 | 0.003 | 0 | 0.006 | 0.011 | 0.001 | 0 | 0.006 | 0 | 0 | 0 |
| 14 | 0.294 | 0 | 0 | 0.341 | 0.395 | 0.034 | 0 | 0.55 | 0 | 0 | 0.008 |
| 15 | 0 | 0 | 0.008 | 0 | 0 | 0.006 | 0 | 0 | 0 | 0 | 0 |
| 16 | 0.014 | 0.008 | 0.001 | 0.013 | 0.005 | 0.003 | 0 | 0.01 | 0 | 0 | 0 |
| 17 | 0.001 | 0.001 | 0.002 | 0.011 | 0 | 0.027 | 0.001 | 0 | 0 | 0 | 0.03 |
| 18 | 0 | 0.001 | 0.003 | 0.001 | 0.001 | 0.001 | 0 | 0.001 | 0 | 0 | 0.011 |
| 19 | 0.005 | 0.001 | 0 | 0.004 | 0.005 | 0.001 | 0 | 0.006 | 0 | 0 | 0.001 |
| 20 | 0 | 0.001 | 0 | 0.001 | 0.001 | 0.004 | 0 | 0 | 0 | 0 | 0 |
| 21 | 0.003 | 0.002 | 0 | 0.002 | 0.002 | 0.003 | 0.006 | 0.005 | 0 | 0 | 0.001 |
| 22 | 0.001 | 0 | 0.769 | 0 | 0 | 0.001 | 0 | 0 | 0.985 | 0.983 | 0.001 |
| 23 | 0.03 | 0 | 0.005 | 0.037 | 0.037 | 0.001 | 0 | 0.024 | 0 | 0 | 0.003 |
| 24 | 0 | 0.007 | 0.001 | 0 | 0 | 0 | 0.007 | 0 | 0.002 | 0.002 | 0 |
| 25 | 0 | 0 | 0 | 0 | 0 | 0.002 | 0 | 0 | 0 | 0 | 0 |
| 26 | 0.101 | 0 | 0.005 | 0.065 | 0.047 | 0.006 | 0 | 0.022 | 0 | 0 | 0.015 |
| 27 | 0.009 | 0.749 | 0 | 0.011 | 0.016 | 0.001 | 0.735 | 0.02 | 0 | 0 | 0 |
| 28 | 0.231 | 0 | 0 | 0.227 | 0.219 | 0 | 0 | 0.054 | 0 | 0 | 0 |
| 29 | 0 | 0.007 | 0 | 0 | 0 | 0 | 0.005 | 0 | 0 | 0 | 0 |
| 30 | 0 | 0 | 0 | 0 | 0 | 0.001 | 0 | 0 | 0 | 0 | 0.321 |
| 31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.004 | 0.004 | 0 |
| 32 | 0 | 0 | 0 | 0 | 0 | 0.083 | 0 | 0 | 0 | 0 | 0.002 |
| 33 | 0.007 | 0.001 | 0.082 | 0.005 | 0.002 | 0.095 | 0.001 | 0.002 | 0.001 | 0.001 | 0.131 |
| 34 | 0.057 | 0.001 | 0.011 | 0.04 | 0.015 | 0 | 0 | 0.013 | 0 | 0 | 0.413 |
| 35 | 0 | 0 | 0 | 0 | 0 | 0.017 | 0 | 0 | 0.001 | 0.001 | 0 |
| 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 37 | 0.005 | 0.001 | 0 | 0.006 | 0.008 | 0.005 | 0.003 | 0.013 | 0 | 0 | 0.003 |
| 38 | 0.007 | 0 | 0 | 0.009 | 0.011 | 0.038 | 0 | 0.011 | 0 | 0 | 0 |
| 39 | 0.001 | 0.036 | 0 | 0.001 | 0.002 | 0.046 | 0.048 | 0 | 0 | 0 | 0 |
| 40 | 0 | 0 | 0.001 | 0 | 0.001 | 0.003 | 0 | 0 | 0 | 0 | 0 |
| 41 | 0.01 | 0 | 0.016 | 0.01 | 0.007 | 0.001 | 0 | 0.005 | 0 | 0 | 0.001 |
| 42 | 0.002 | 0.031 | 0.009 | 0.004 | 0.001 | 0.004 | 0.032 | 0.002 | 0 | 0 | 0.002 |
| 43 | 0.001 | 0.002 | 0.006 | 0 | 0 | 0.072 | 0.006 | 0 | 0 | 0 | 0.009 |
| 44 | 0 | 0 | 0 | 0 | 0 | 0.024 | 0 | 0 | 0 | 0 | 0.001 |
| 45 | 0 | 0 | 0 | 0 | 0 | 0.008 | 0 | 0 | 0 | 0 | 0.023 |
| 46 | 0 | 0 | 0.001 | 0 | 0 | 0.012 | 0 | 0 | 0.001 | 0.001 | 0 |
| 47 | 0 | 0 | 0 | 0 | 0.001 | 0.001 | 0 | 0.001 | 0 | 0 | 0 |
| 48 | 0.023 | 0 | 0 | 0.029 | 0.032 | 0.001 | 0 | 0.049 | 0 | 0 | 0 |
| 49 | 0 | 0 | 0 | 0 | 0 | 0.017 | 0 | 0 | 0 | 0 | 0.003 |
| 50 | 0 | 0 | 0 | 0 | 0 | 0.002 | 0 | 0 | 0 | 0 | 0 |
Grouping by source and topic, the mean proportion of each topic is given by
\[ \bar{\theta}_{i} = \frac{1}{d} \sum^{d}_{j=1} \theta_{ij} \]
where \(i \in \{1, \dots, k\}\).
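Under this definition the source-level agenda is simply the vector of row means of \(\Theta_s\); a minimal base-R check on a toy \(3 \times 4\) matrix (made up for illustration, each column summing to 1):

```r
# Toy Theta_s: k = 3 topics (rows) x d = 4 documents (columns)
theta_s <- matrix(c(0.6, 0.3, 0.1,
                    0.2, 0.5, 0.3,
                    0.1, 0.1, 0.8,
                    0.5, 0.4, 0.1),
                  nrow = 3)

# Mean proportion per topic: (1/d) * sum_j theta_ij
topic_means <- rowMeans(theta_s)
round(topic_means, 3)
# topic 1: (0.6 + 0.2 + 0.1 + 0.5) / 4 = 0.35
```

Because every column sums to 1, the topic means themselves sum to 1 and can be read as the source's expected topic distribution.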
# calculate topic mean by source and month
topicmean <- theta %>%
mutate(
year = lubridate::year(date),
month = lubridate::month(date)
) %>%
filter(month != 5) %>%
group_by(topic,source) %>%
dplyr::summarise(topicmean = mean(theta)) %>%
ungroup() %>%
spread(source, topicmean)
theta %>%
mutate(
year = lubridate::year(date),
month = lubridate::month(date)
) %>%
filter(month != 5) %>%
group_by(topic,source) %>%
dplyr::summarise(topicmean = mean(theta)) %>%
ungroup() %>%
mutate(topicmean = round(topicmean,4)) %>%
spread(source, topicmean) %>%
select(-topic) %>%
htmlTable::htmlTable()
| Topic | AfD | B90/GRÜNE | Bild.de | CDU | DIE LINKE | DIE WELT | FDP | FOCUS Online | SPD | SPIEGEL ONLINE | stern.de | tagesschau.de | ZEIT ONLINE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0176 | 7e-04 | 0.0508 | 0.007 | 0.0065 | 0.0376 | 0.0077 | 0.049 | 0.0177 | 0.0486 | 0.0385 | 0.0378 | 0.0545 |
| 2 | 0.0146 | 0.0063 | 0.038 | 0.0014 | 0.004 | 0.0212 | 0.0088 | 0.0285 | 0.0019 | 0.0362 | 0.0165 | 0.0184 | 0.0299 |
| 3 | 0.0111 | 0.0398 | 0.0243 | 0.0016 | 0.0081 | 0.025 | 0.0146 | 0.0231 | 0.0043 | 0.027 | 0.0326 | 0.0166 | 0.0339 |
| 4 | 0.0077 | 0.0062 | 0.0643 | 0.0067 | 0.0178 | 0.0504 | 0.0301 | 0.055 | 0.0095 | 0.0551 | 0.0362 | 0.0509 | 0.0573 |
| 5 | 0.0059 | 0.0021 | 0.0125 | 0.0048 | 0.0027 | 0.0011 | 0.006 | 8e-04 | 0.005 | 0.0023 | 0.0023 | 0.0012 | 0.0026 |
| 6 | 0.0209 | 0.0579 | 0.0148 | 0.0505 | 0.0388 | 0.0073 | 0.0499 | 0.0057 | 0.0288 | 0.0116 | 0.0118 | 0.034 | 0.0158 |
| 7 | 0.0169 | 3e-04 | 0.0167 | 0.0073 | 0.006 | 0.0161 | 0.0103 | 0.0106 | 0.0093 | 0.0102 | 0.0114 | 0.0148 | 0.0092 |
| 8 | 0.0079 | 0.0017 | 0.0262 | 4e-04 | 0.0143 | 0.0178 | 0.0023 | 0.0205 | 0.0052 | 0.024 | 0.0174 | 0.0204 | 0.0213 |
| 9 | 0.0185 | 0.0024 | 0.0301 | 0.0043 | 0.0061 | 0.0284 | 0.0113 | 0.0279 | 0.01 | 0.0279 | 0.0326 | 0.0179 | 0.0281 |
| 10 | 0.0151 | 0.0211 | 0.0186 | 0.0422 | 0.0307 | 0.0143 | 0.0142 | 0.0102 | 0.0065 | 0.0355 | 0.0125 | 0.0097 | 0.0111 |
| 11 | 0.005 | 0.014 | 0.0311 | 0.0241 | 0.0014 | 0.0248 | 0.0288 | 0.0328 | 0.0199 | 0.0301 | 0.0225 | 0.0258 | 0.0352 |
| 12 | 0.0151 | 0.0235 | 0.0059 | 0.0063 | 0.0281 | 0.0182 | 0.0013 | 0.0203 | 0.0086 | 0.0088 | 0.019 | 0.0124 | 0.0094 |
| 13 | 0.0141 | 0.0033 | 0.0064 | 0.0056 | 0.0098 | 0.0164 | 0.0048 | 0.0131 | 0.0041 | 0.0065 | 0.0062 | 0.0037 | 0.0084 |
| 14 | 0.0187 | 0.0074 | 0.0431 | 0.0153 | 0.0125 | 0.0552 | 0.0119 | 0.0382 | 0.0101 | 0.0374 | 0.0507 | 0.0292 | 0.0358 |
| 15 | 0.011 | 0.0044 | 0.0145 | 0.0089 | 0.0061 | 0.0072 | 0.0131 | 0.0049 | 0.0194 | 0.0047 | 0.0024 | 0.0068 | 0.0048 |
| 16 | 0.0062 | 0.0036 | 0.01 | 0.0121 | 0.0083 | 0.0097 | 0.0034 | 0.0101 | 0.0176 | 0.0087 | 0.0127 | 0.0085 | 0.0122 |
| 17 | 0.0472 | 0.0195 | 0.0271 | 0.0305 | 0.0514 | 0.0181 | 0.0888 | 0.0164 | 0.0353 | 0.0204 | 0.0352 | 0.042 | 0.028 |
| 18 | 0.0014 | 0.0039 | 0.0092 | 0.0117 | 0.0024 | 0.013 | 0.0019 | 0.0088 | 0.0095 | 0.0086 | 0.0124 | 0.0117 | 0.0096 |
| 19 | 0.1081 | 0.0019 | 0.0267 | 2e-04 | 7e-04 | 0.0265 | 5e-04 | 0.0197 | 0.002 | 0.0202 | 0.025 | 0.0161 | 0.0288 |
| 20 | 0.0455 | 0.0193 | 0.0119 | 0.0114 | 0.0342 | 0.0515 | 0.0303 | 0.0589 | 0.0115 | 0.0246 | 0.0261 | 0.0444 | 0.0117 |
| 21 | 0.012 | 0.0096 | 0.0104 | 0.0606 | 0.0141 | 0.0082 | 0.0173 | 0.006 | 0.0479 | 0.0099 | 0.0116 | 0.0121 | 0.0086 |
| 22 | 0.0071 | 0.0144 | 0.013 | 0.0379 | 0.01 | 0.0102 | 0.019 | 0.0171 | 0.0274 | 0.0179 | 0.0206 | 0.0079 | 0.0124 |
| 23 | 0.0043 | 4e-04 | 0.051 | 0.0028 | 5e-04 | 0.0506 | 0.2278 | 0.0377 | 5e-04 | 0.0461 | 0.0422 | 0.0412 | 0.0538 |
| 24 | 0.0453 | 0.0299 | 0.0161 | 0.0327 | 0.097 | 0.0146 | 0.0193 | 0.0197 | 0.0113 | 0.0315 | 0.0231 | 0.0219 | 0.0145 |
| 25 | 0.0101 | 0.0257 | 0.0183 | 0.0434 | 0.0151 | 0.0183 | 0.0058 | 0.0233 | 0.0507 | 0.0096 | 0.01 | 0.0192 | 0.0076 |
| 26 | 0.0108 | 0.0016 | 0.0264 | 0.0014 | 0.0038 | 0.0275 | 0.0099 | 0.0255 | 0.0057 | 0.028 | 0.0281 | 0.0261 | 0.0345 |
| 27 | 0.0201 | 0.0234 | 0.0093 | 0.0223 | 0.0436 | 0.0215 | 0.0152 | 0.031 | 0.0085 | 0.0419 | 0.0674 | 0.0151 | 0.0194 |
| 28 | 0.0465 | 0.0046 | 0.0237 | 2e-04 | 6e-04 | 0.0191 | 6e-04 | 0.0203 | 0.0012 | 0.0268 | 0.0268 | 0.0126 | 0.0352 |
| 29 | 0.0135 | 0.0187 | 0.0147 | 0.003 | 0.0246 | 0.0207 | 0.0082 | 0.0217 | 0.002 | 0.0171 | 0.0202 | 0.0269 | 0.0229 |
| 30 | 0.0123 | 0.0205 | 0.0145 | 0.0032 | 0.0413 | 0.0066 | 0.0151 | 0.0075 | 0.0234 | 0.0103 | 0.0076 | 0.03 | 0.0105 |
| 31 | 0.0133 | 0.0028 | 0.0072 | 0.0041 | 0.0021 | 0.0034 | 0.0033 | 0.0064 | 0.0047 | 0.0059 | 0.006 | 0.0068 | 0.0029 |
| 32 | 0.0138 | 0.0109 | 0.0223 | 0.0299 | 0.0233 | 0.0404 | 0.0117 | 0.0308 | 0.0341 | 0.0161 | 0.0296 | 0.0365 | 0.0217 |
| 33 | 0.0085 | 0.027 | 0.0407 | 0.0592 | 0.0125 | 0.0441 | 0.0104 | 0.0399 | 0.0445 | 0.0378 | 0.0192 | 0.0315 | 0.0546 |
| 34 | 0.0327 | 0.023 | 0.0246 | 0.0126 | 0.0125 | 0.0243 | 0.005 | 0.0261 | 0.0114 | 0.0323 | 0.023 | 0.0309 | 0.0346 |
| 35 | 0.0126 | 4e-04 | 0.0137 | 0.0158 | 0.001 | 0.0112 | 0.0014 | 0.0094 | 0.0011 | 0.0039 | 0.0034 | 0.0065 | 0.0036 |
| 36 | 0.0114 | 0.0269 | 0.0105 | 0.0405 | 0.0291 | 0.0255 | 0.0169 | 0.0476 | 0.054 | 0.0284 | 0.0299 | 0.0525 | 0.0096 |
| 37 | 0.005 | 0.0051 | 0.0094 | 0.0024 | 0.0105 | 0.0095 | 0.0038 | 0.0119 | 0.0071 | 0.0155 | 0.0127 | 0.0096 | 0.0164 |
| 38 | 0.0125 | 0.0018 | 0.033 | 0.0017 | 0.0014 | 0.0056 | 0.0062 | 0.0105 | 0.0028 | 0.0045 | 0.004 | 0.0041 | 0.0042 |
| 39 | 0.0544 | 0.0839 | 0.0118 | 0.0308 | 0.0369 | 0.0207 | 0.064 | 0.0181 | 0.0305 | 0.0276 | 0.0263 | 0.0177 | 0.0186 |
| 40 | 0.0571 | 0.0056 | 0.0083 | 0.0019 | 0.0032 | 0.0147 | 0.0011 | 0.0135 | 3e-04 | 0.0109 | 0.012 | 0.0091 | 0.0156 |
| 41 | 0.0041 | 0.004 | 0.0151 | 0.0366 | 0.0052 | 0.0149 | 0.0032 | 0.0149 | 0.0051 | 0.0156 | 0.0127 | 0.0078 | 0.0182 |
| 42 | 0.0185 | 0.0599 | 0.0116 | 0.0144 | 0.0198 | 0.0082 | 0.0213 | 0.0098 | 0.0279 | 0.0121 | 0.0167 | 0.0171 | 0.0181 |
| 43 | 0.0343 | 0.1337 | 0.0183 | 0.1098 | 0.061 | 0.0263 | 0.0443 | 0.0128 | 0.1003 | 0.0167 | 0.0188 | 0.0339 | 0.022 |
| 44 | 0.0648 | 0.001 | 0.014 | 0.0191 | 0.0072 | 0.0259 | 0.0165 | 0.0182 | 0.0032 | 0.0142 | 0.0123 | 0.0186 | 0.0253 |
| 45 | 0.0067 | 0.0025 | 0.0048 | 0.0133 | 0.0149 | 0.0128 | 0.0177 | 0.0141 | 0.0151 | 0.0217 | 0.0235 | 0.0111 | 0.0234 |
| 46 | 0.0215 | 0.0319 | 0.0279 | 0.0299 | 0.0075 | 0.0216 | 0.0033 | 0.0224 | 0.0223 | 0.0362 | 0.0073 | 0.0167 | 0.0079 |
| 47 | 0.0161 | 0.0096 | 0.0049 | 0.0021 | 0.0127 | 0.0023 | 4e-04 | 0.0039 | 0.0172 | 0.0029 | 0.0184 | 0.0038 | 0.0045 |
| 48 | 0.0017 | 7e-04 | 0.0126 | 1e-04 | 4e-04 | 0.0021 | 0.0016 | 0.0043 | 5e-04 | 6e-04 | 8e-04 | 5e-04 | 0.0036 |
| 49 | 0.0155 | 0.1754 | 0.0276 | 0.1149 | 0.1922 | 0.0264 | 0.0868 | 0.0161 | 0.2006 | 0.0073 | 0.0359 | 0.0472 | 0.0276 |
| 50 | 0.0051 | 0.0059 | 0.0019 | 0.0012 | 0.0058 | 0.0035 | 0.0025 | 0.0046 | 0.0027 | 0.0021 | 0.0058 | 0.0031 | 0.001 |
Next, I estimate bivariate correlations between party agendas (press releases) and the mediated party agendas in the online news. These correlations represent the agenda selectivity each party experiences in each media outlet. The higher the correlation, the more congruent both agendas are.
E.g. for “CDU” we get the following correlations:
media <- unique(model_df %>% filter(type == "news") %>% select(source))
parties <- unique(model_df %>% filter(type == "press") %>% select(source))
cor(topicmean[,media$source], topicmean[,"CDU"]) %>% htmlTable::htmlTable()
| Medium | CDU |
|---|---|
| DIE WELT | 0.111 |
| stern.de | 0.105 |
| ZEIT ONLINE | 0.036 |
| FOCUS Online | -0.010 |
| Bild.de | 0.004 |
| SPIEGEL ONLINE | -0.005 |
| tagesschau.de | 0.373 |
# calculate topic mean by source and month
topicmean <- theta %>%
mutate(
year = lubridate::year(date),
month = lubridate::month(date)
) %>%
group_by(topic,source, month, year) %>%
dplyr::summarise(topicmean = mean(theta)) %>%
ungroup() %>%
spread(source, topicmean) %>%
filter(month != 5)
media <- unique(model_df %>% filter(type == "news") %>% select(source))
parties <- unique(model_df %>% filter(type == "press") %>% select(source))
if (exists("corrDF")) rm(corrDF)
for (i in parties$source) {
tempdf <- topicmean %>%
group_by(month, year) %>%
do(data.frame(Cor=t(cor(.[,media$source], .[,i])))) %>%
gather(medium, cor, 3:9) %>%
mutate(party = i,
medium = gsub("Cor.","",medium)) %>%
ungroup()
if (exists("corrDF")){
corrDF <- rbind(corrDF,tempdf)
} else {
corrDF <- tempdf
}
}
agenda <- corrDF %>%
mutate(date = as.Date(paste0(year,"/",month,"/1")),
cor_norm = normalize_data2(cor)
) %>%
dplyr::mutate(medium = ifelse(medium == "DIE.WELT", "DIE WELT", medium),
medium = ifelse(medium == "ZEIT.ONLINE", "ZEIT ONLINE", medium),
medium = ifelse(medium == "FOCUS.Online", "FOCUS Online", medium),
medium = ifelse(medium == "SPIEGEL.ONLINE", "SPIEGEL ONLINE", medium)
) %>%
filter(month != 5)
agenda %>%
ggplot(aes(date, cor,
color = medium,
linetype = medium)) +
geom_line() +
geom_hline(yintercept = 0, size = 0.3, color = color1) +
facet_wrap(~party) +
theme_hc() +
scale_color_viridis_d() +
labs(y=NULL, x =NULL) +
scale_x_date(date_breaks = "1 month", date_labels = "%b-%y") +
theme(legend.position = "bottom",
legend.title = element_blank()) +
guides(col = guide_legend(nrow = 1))
#plotly::ggplotly(p, tooltip=c("cor","medium"))
ggsave("../figs/corr_timeline.png", height = 6, width = 10)
# normalized between -1 and 1
agenda_norm <- agenda %>%
select(medium, cor, party, date) %>%
spread(key = party, value = cor) %>%
mutate(
AfD = normalize_data2(AfD),
`B90/GRÜNE` = normalize_data2(`B90/GRÜNE`),
CDU = normalize_data2(CDU),
`DIE LINKE` = normalize_data2(`DIE LINKE`),
FDP = normalize_data2(FDP),
SPD = normalize_data2(SPD)
) %>%
gather(key = party, value = cor, AfD:SPD)
ggplot(agenda_norm,
aes(date, cor,
color = medium,
linetype = medium)) +
geom_line() +
geom_hline(yintercept = 0, size = 0.3, color = color1) +
facet_wrap(~party) +
theme_hc() +
scale_color_viridis_d() +
labs(y=NULL, x =NULL) +
theme(legend.position = "bottom",
legend.title = element_blank()) +
guides(col = guide_legend(nrow = 1))
#plotly::ggplotly(p, tooltip=c("cor","medium"))
ggsave("../figs/corr_diff_timeline.png", height = 6, width = 10)
radar <- agenda %>%
group_by(party, medium) %>%
summarize(cor = mean(cor, na.rm = T)) %>%
spread(key = party, value = cor)
ggiraphExtra::ggRadar(radar, aes(color = medium),
rescale = F,
alpha = 0) +
theme_hc() +
scale_color_viridis_d() +
theme(legend.position = "right",
#axis.text.y = element_blank(),
#axis.ticks.y = element_blank(),
legend.title = element_blank()) +
guides(col = guide_legend(ncol = 1))
ggsave("../figs/radar1.png", width = 16, height = 9, dpi = 120)
Due to differences in political relevance, not all potential topics receive equal amounts of media coverage. However, such factors should influence all media outlets equally. To what extent does the topic correlation of a party in one medium differ from the average topic correlation across the media? To calculate this relative topic correlation, I estimate the deviation of a party's topic correlation in one medium from that party's average topic correlation across all newspapers.
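This deviation can be computed directly from the monthly correlation data frame; a dplyr sketch on a toy version of `corrDF` (column names follow `corrDF` above; `cor_dev` is a new name introduced here):

```r
library(dplyr)

# Toy monthly correlations: one party, one month, three outlets
toy <- tibble::tibble(
  party  = "CDU",
  month  = 9, year = 2017,
  medium = c("DIE WELT", "Bild.de", "tagesschau.de"),
  cor    = c(0.10, -0.02, 0.37)
)

# Relative correlation: deviation of each outlet from the
# party's average correlation across all outlets that month
toy_rel <- toy %>%
  group_by(party, month, year) %>%
  mutate(cor_dev = cor - mean(cor, na.rm = TRUE)) %>%
  ungroup()
toy_rel$cor_dev
# mean = 0.15, so the deviations are -0.05, -0.17, 0.22
```

Positive deviations mark outlets that track a party's agenda more closely than the average outlet does, which is what the radar plot below visualizes.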
ggiraphExtra::ggRadar(radar, aes(color = medium),
alpha = 0) +
theme_hc() +
scale_color_viridis_d() +
theme(legend.position = "none",
legend.title = element_blank()) +
guides(col = guide_legend(ncol = 1))
ggsave("../figs/radar_diff.png", width = 16, height = 9, dpi = 120)